Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

Author: Bunescu R. C.
Cheng X.
Cucerzan S.
He Z.
Recht B.
Rizzo G.
Spitkovsky V. I.
Yedidia J.
Publication date: 29/01/2016

Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referred to as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e., linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method requires neither extensive feature engineering nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.
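The joint disambiguation step described in this abstract can be illustrated with a small sketch of sum-product loopy belief propagation over a fully connected graph of mentions. This is not the authors' model: the abstract does not give the potentials or the message schedule, so the function name loopy_bp, the candidate entities, the local scores and the compatibility values below are all hypothetical, and all potentials are assumed to be strictly positive.

```python
import math

def loopy_bp(unary, pairwise, iterations=10):
    """Approximate per-mention marginals with sum-product loopy BP.

    unary:    list (one item per mention) of {candidate: local score > 0}
    pairwise: function (entity_a, entity_b) -> compatibility score > 0
    """
    n = len(unary)
    # msgs[(i, j)][e] is the message from mention i to mention j
    # about candidate e of mention j; start with uniform messages.
    msgs = {(i, j): {e: 1.0 for e in unary[j]}
            for i in range(n) for j in range(n) if i != j}
    for _ in range(iterations):
        new_msgs = {}
        for (i, j) in msgs:
            out = {}
            for ej in unary[j]:
                total = 0.0
                for ei, local in unary[i].items():
                    # Product of messages into i from every mention except j.
                    incoming = math.prod(msgs[(k, i)][ei]
                                         for k in range(n) if k not in (i, j))
                    total += local * pairwise(ei, ej) * incoming
                out[ej] = total
            z = sum(out.values())
            new_msgs[(i, j)] = {e: v / z for e, v in out.items()}
        msgs = new_msgs
    beliefs = []
    for i in range(n):
        b = {e: local * math.prod(msgs[(k, i)][e] for k in range(n) if k != i)
             for e, local in unary[i].items()}
        z = sum(b.values())
        beliefs.append({e: v / z for e, v in b.items()})
    return beliefs

# Hypothetical toy document with three mentions: "Jaguar", "Ford", "Mustang".
unary = [
    {"Jaguar_Cars": 0.4, "Jaguar_(animal)": 0.6},
    {"Ford_Motor_Company": 0.9, "Ford_(crossing)": 0.1},
    {"Ford_Mustang": 0.5, "Mustang_(horse)": 0.5},
]
RELATED = {frozenset(p) for p in [
    ("Jaguar_Cars", "Ford_Motor_Company"),
    ("Ford_Motor_Company", "Ford_Mustang"),
    ("Jaguar_Cars", "Ford_Mustang"),
]}

def compat(a, b):
    # Made-up co-occurrence prior: related entities are 5x more compatible.
    return 5.0 if frozenset((a, b)) in RELATED else 1.0

beliefs = loopy_bp(unary, compat)
for belief in beliefs:
    print(max(belief, key=belief.get))
```

In this toy setup the local score alone prefers the animal sense of "Jaguar", and the co-occurrence prior with the other mentions shifts the belief towards the car maker.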

arXiv.org e-Print Archive

Crossref

Query Suggestion and Data Fusion in Contextual Disambiguation

Author: Cucerzan S.
Friedman J. H.
Mihalkova L.
Shaw J. A.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Fifty years of spellchecking

Author: Blair CR
Brooks G
Carlson AJ
Cucerzan S
Damerau FJ
Golding AR
Leech G
Levenshtein VI
McIlroy MD
Mihov S
Mitton R
Morris R
Oflazer K
Pedler J
Peterson JL
Pollock JL
Roger Mitton
Savary A
Sterling CM
Veronis J
Wagner RA
Wing AM
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

A short history of spellchecking from the late 1950s to the present day, describing its development through dictionary lookup, affix stripping, correction, confusion sets, and edit distance to the use of gigantic databases.
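As an illustration of the edit-distance technique named in this abstract, the sketch below computes the standard Levenshtein distance by dynamic programming and uses it to rank dictionary words. It is not taken from the article; the helper name suggest, the tiny word list and the max_distance cutoff are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def suggest(word, dictionary, max_distance=1):
    """Hypothetical helper: dictionary words within max_distance edits,
    closest first (ties broken alphabetically)."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]

print(suggest("speling", ["spelling", "spewing", "sailing"]))
```

A real spellchecker would avoid scanning the whole dictionary for every word; the linear scan here only keeps the example short.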

Crossref

Birkbeck Institutional Research Online

Entity linking for biomedical literature

Author: AB Abacha
B Liu
Boliang Zhang
CE Shannon
D Milne
Daniel Howsmon
Deborah McGuinness
H Fang
H Ji
Heng Ji
HJ Dai
J Biesiada
J Zheng
James Hendler
Jin G Zheng
Juergen Hahn
L Hirschman
L Hunter
L Page
L Ratinov
LM Akella
M Frisch
M Miwa
M Pennacchiotti
NDB Bruce
P Ferragina
PN Mendes
R Mihalcea
S Cucerzan
S Kulkarni
T Cassidy
T Lipniacki
V Punyakanok
W Shen
X Cheng
X Liu
Y Guo
Y Sun
Y Usami
Publication venue: 'Springer Science and Business Media LLC'

Crossref

X-LiSA

Author: Cucerzan S.
Han X.
Sorg P.
Zhang L.
Publication venue: 'VLDB Endowment'

Crossref

Searching Locally-Defined Entities

Author: Burges C. J. C.
Cucerzan S.
Lin J.
Robertson S. E.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Web scale taxonomy cleansing

Author: Bunescu R.
Carlson A.
Chaudhuri S.
Cohen W. W.
Cucerzan S.
Monge A.
Song Y.
Publication venue: 'VLDB Endowment'

Crossref

AIDA

Author: Auer S.
Bunescu R.C.
Cucerzan S.
Hoffart J.
Kulkarni S.
Milne D.N.
Singla P.
Suchanek F.M.
Wick M.L.
Publication venue: 'VLDB Endowment'

Crossref

Enriching Query Flow Graphs with Click Information

Author: E. Agichtein
F. Silvestri
M.-D. Albakour
P. Boldi
R.W. White
S. Cucerzan
T. Joachims
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The increased availability of large amounts of data about user search behaviour in search engines has triggered a lot of research in recent years. This includes developing machine learning methods to build knowledge structures that could be exploited for a number of tasks such as query recommendation. Query flow graphs are a successful example of these structures; they are generated from the sequence of queries typed in by a user in a search session. In this paper we propose to modify the query flow graph by incorporating clickthrough information from the search logs. Click information provides evidence of the success or failure of the search journey and can therefore be used to enrich the query flow graph, making it more accurate and useful for query recommendation. We propose a method of adjusting the weights on the edges of the query flow graph by incorporating the number of clicked documents after submitting a query. We explore a number of weighting functions for the graph edges using click information. Applying an automated evaluation framework to assess query recommendations allows us to perform automatic and reproducible evaluation experiments. We demonstrate how our modified query flow graph outperforms the standard query flow graph. The experiments are conducted on the search logs of an academic organisation's search engine and validated in a second experiment on the log files of another Web site. © 2011 Springer-Verlag Berlin Heidelberg
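The edge re-weighting idea in this abstract can be sketched as follows. The abstract does not state which weighting functions the authors tested, so the function names, the session format (a list of (query, number of clicked documents) pairs) and the weighting function 1 + clicks are assumptions chosen only to show the mechanism.

```python
from collections import defaultdict

def build_query_flow_graph(sessions, weight_fn=lambda clicks: 1.0):
    """Build a weighted directed graph of query reformulations.

    sessions:  list of sessions, each a list of (query, clicks) pairs
    weight_fn: maps the click count of the follow-up query to the amount
               added to the edge weight; the default ignores clicks and
               simply counts transitions.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for (q1, _), (q2, clicks2) in zip(session, session[1:]):
            if q1 != q2:  # skip repeated submissions of the same query
                graph[q1][q2] += weight_fn(clicks2)
    return graph

def recommend(graph, query, k=3):
    """Return up to k follow-up queries, heaviest edge first."""
    edges = graph.get(query, {})
    ranked = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:k]]

# Hypothetical log: three sessions that all start with "timetable".
sessions = [
    [("timetable", 0), ("exam timetable", 2)],
    [("timetable", 0), ("bus timetable", 0)],
    [("timetable", 0), ("bus timetable", 0)],
]

plain = build_query_flow_graph(sessions)
clicked = build_query_flow_graph(sessions, weight_fn=lambda c: 1.0 + c)

print(recommend(plain, "timetable"))
print(recommend(clicked, "timetable"))
```

With transition counts alone the more frequent reformulation ranks first; once clicks contribute to the weight, the reformulation that led to clicked documents overtakes it.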

University of Essex Research Repository

Crossref

Open Research Online (The Open University)

Archivio della Ricerca - Università di Roma 3

Joint Institute for Nuclear Research (JINR)

Lancaster E-Prints

Robust and Collective Entity Disambiguation through Semantic Embeddings

Author: Alhelbawy A.
Bunescu R.
Cheng X.
Cucerzan S.
He Z.
Le Q. V.
Mikolov T.
Ratinov L.
Röder M.
Sojka P.
Zwicklbauer S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. It finds its application in the extraction of structured data in RDF (Resource Description Framework) from textual documents, but equally so in facilitating artificial intelligence applications, such as Semantic Search, Reasoning and Question & Answering. We propose a new collective, graph-based disambiguation algorithm utilizing semantic entity and document embeddings for robust entity disambiguation. Robust thereby refers to the property of achieving better than state-of-the-art results over a wide range of very different data sets. Our approach is also able to abstain if no appropriate entity can be found for a specific surface form. Our evaluation shows that our approach achieves significantly (>5%) better results than all other publicly available disambiguation algorithms on 7 of 9 datasets without data set specific tuning. Moreover, we discuss the influence of the quality of the knowledge base on the disambiguation accuracy and indicate that our algorithm achieves better results than non-publicly available state-of-the-art algorithms.
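Two ingredients mentioned in this abstract, scoring candidates with embeddings and abstaining when no candidate fits, can be shown in a deliberately simplified sketch. It is not the collective, graph-based algorithm of the paper: it scores each surface form independently by cosine similarity, and the function names, the two-dimensional vectors and the 0.5 threshold are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link(context_vec, candidates, threshold=0.5):
    """Return the candidate entity whose embedding is closest to the
    context embedding, or None (abstain) if no score reaches threshold.

    candidates: {entity name: embedding vector}
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda e: cosine(context_vec, candidates[e]))
    return best if cosine(context_vec, candidates[best]) >= threshold else None

# Hypothetical 2-dimensional embeddings for the surface form "Jaguar".
candidates = {
    "Jaguar_Cars": [1.0, 0.0],
    "Jaguar_(animal)": [0.0, 1.0],
}

print(link([0.9, 0.1], candidates))    # context close to the car sense
print(link([-1.0, -1.0], candidates))  # context unlike either sense
```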

Crossref

University of Twente Research Information